Performance Metrics and Optimization

How  are  performance  metrics  used? 

-  Sensitivity  studies 

-  System  design 

-  Decision  aid  for  strategic  planning 

-  Adapting  system  over  time 

-  Detecting  instability;  avoiding  unstable  performance 

-  Evaluating  system  reliability 

-  Design  of  experiments 

-  Mathematical  modeling  and  parameter  estimation 

-  And  on  and  on.... 

Most of the above involve optimization

Claim: Impossible to have a performance metrics
conference without seriously considering optimization!


Search and Optimization Algorithms as Part of Problem Solving

There  exist  many  deterministic  and  stochastic  algorithms 

Algorithms  are  part  of  the  broader  solution 

Need  clear  understanding  of  problem  structure,  constraints, 
data  characteristics,  political  and  social  context,  limits  of 
algorithms,  etc. 

“Imagine  how  much  money  could  be  saved  if  truly 
appropriate  techniques  were  applied  that  go  beyond  simple 
linear  programming.”  (Z.  Michalewicz  and  D.  Fogel,  2000) 

- Deeper understanding required to provide truly appropriate
solutions; commercial off-the-shelf (COTS) software usually not enough!

Many  (most?)  real-world  implementations  involve  stochastic 
effects 


Potpourri of Problems Using Stochastic Search and Optimization

Minimize  the  costs  of  shipping  from  production  facilities  to 
warehouses 

Maximize  the  probability  of  detecting  an  incoming  warhead 
(vs.  decoy)  in  a  missile  defense  system 

Place sensors in a manner that maximizes useful information

Determine  the  times  to  administer  a  sequence  of  drugs  for 
maximum  therapeutic  effect 

Find  the  best  red-yellow-green  signal  timings  in  an  urban 
traffic  network 

Determine  the  best  schedule  for  use  of  laboratory  facilities 
to  serve  an  organization’s  overall  interests 


Two  Fundamental  Problems  of  Interest 


Let $\Theta$ be the domain of allowable values for a vector $\theta$
$\theta$ represents a vector of “adjustables”

- $\theta$ may be continuous or discrete (or both)

Two fundamental problems of interest:

Problem 1. Find the value(s) of a vector $\theta \in \Theta$
that minimize a scalar-valued loss function $L(\theta)$

-- or --

Problem 2. Find the value(s) of $\theta \in \Theta$ that solve the
equation $g(\theta) = 0$ for some vector-valued function $g(\theta)$


Frequently (but not necessarily) $g(\theta) = \partial L(\theta)/\partial\theta$


Three Common Types of Loss Functions

[Figure panels: Continuous; Discrete/Continuous; Discrete]


Stochastic  Search  and  Optimization 

•  Focus  here  is  on  stochastic  search  and  optimization: 

A. Random noise in input information (e.g., noisy
measurements of $L(\theta)$)

-- and/or --

B. Injected randomness (Monte Carlo) in choice of
algorithm iteration magnitude/direction

• Contrasts with deterministic methods

- E.g., steepest descent, Newton-Raphson, etc.

- Assume perfect information about $L(\theta)$ (and its gradients)

-  Search  magnitude/direction  deterministic  at  each  iteration 

•  Injected  randomness  (B)  in  search  magnitude/direction  can 
offer  benefits  in  efficiency  and  robustness 

-  E.g.,  Capabilities  for  global  (vs.  local)  optimization 
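
The contrast between a deterministic search and injected randomness (B) can be sketched on a toy problem. The one-dimensional loss below, with one local and one global minimum, is an assumption made for illustration, as are the step size, sample count, and function names. Steepest descent started at $\theta = 2$ settles in the nearer (local) minimum, while a blind random search over $[-3, 3]$ lands near the global one, at the price of many more loss evaluations.

```python
import random

def loss(theta):
    """Assumed toy loss with a local minimum near 1.35 and a global one near -1.47."""
    return theta ** 4 - 4.0 * theta ** 2 + theta

def gradient(theta):
    """Exact derivative of the toy loss."""
    return 4.0 * theta ** 3 - 8.0 * theta + 1.0

def steepest_descent(theta, step=0.01, iterations=500):
    """Deterministic search: direction and magnitude fixed by the gradient."""
    for _ in range(iterations):
        theta -= step * gradient(theta)
    return theta

def blind_random_search(low, high, samples, seed=0):
    """Injected randomness: sample candidates uniformly and keep the best one."""
    rng = random.Random(seed)
    best = rng.uniform(low, high)
    for _ in range(samples - 1):
        candidate = rng.uniform(low, high)
        if loss(candidate) < loss(best):
            best = candidate
    return best

theta_sd = steepest_descent(2.0)
theta_rs = blind_random_search(-3.0, 3.0, samples=2000)
print("steepest descent:", theta_sd, loss(theta_sd))
print("random search:   ", theta_rs, loss(theta_rs))
```

This is only one instance; on a unimodal loss the deterministic method would typically be far cheaper.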
Some Popular Stochastic Search and Optimization Techniques

Random  search 
Stochastic  approximation 

-  Robbins-Monro  and  Kiefer-Wolfowitz 

-  SPSA 

-  NN  backpropagation 

-  Infinitesimal  perturbation  analysis 

-  Recursive  least  squares 

-  Many  others 

Simulated  annealing 

Genetic  algorithms 

Evolutionary  programs  and  strategies 

Reinforcement  learning 

Markov  chain  Monte  Carlo  (MCMC) 

Etc. 


Effects of Noise on Simple Optimization Problem

[Figure: effect of noise on a simple optimization problem; horizontal axis from 0 to 7]
Example Search Path (2 variables): Steepest Descent with Noisy and Noise-Free Input
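
The original plot is not available here, so the following is a minimal sketch of the same idea rather than a reproduction of it. It assumes a two-variable quadratic loss $L(\theta) = \theta_1^2 + 2\theta_2^2$, a constant step size, and Gaussian noise added to each gradient component; all of these choices and names are hypothetical.

```python
import random

def gradient(theta):
    """Exact gradient of the assumed loss L(theta) = theta1^2 + 2*theta2^2."""
    return [2.0 * theta[0], 4.0 * theta[1]]

def steepest_descent(theta0, step, iterations, noise_sd=0.0, seed=0):
    """Return the search path; noise_sd > 0 corrupts every gradient input."""
    rng = random.Random(seed)
    theta = list(theta0)
    path = [tuple(theta)]
    for _ in range(iterations):
        g = gradient(theta)
        noisy_g = [gi + rng.gauss(0.0, noise_sd) for gi in g]
        theta = [t - step * gi for t, gi in zip(theta, noisy_g)]
        path.append(tuple(theta))
    return path

def distance_to_optimum(theta):
    """Euclidean distance to the minimizer (0, 0) of the assumed loss."""
    return (theta[0] ** 2 + theta[1] ** 2) ** 0.5

clean_path = steepest_descent([3.0, 2.0], step=0.1, iterations=50)
noisy_path = steepest_descent([3.0, 2.0], step=0.1, iterations=50, noise_sd=1.0)
print("noise-free final distance:", distance_to_optimum(clean_path[-1]))
print("noisy final distance:     ", distance_to_optimum(noisy_path[-1]))
```

With a constant step size the noise-free path contracts steadily toward the optimum, while the noisy path keeps wandering around it.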
Example of Noisy Loss Measurements: Tracking Problem


• Consider tracking problem where controller and/or system
depend on design parameters $\theta$

- E.g.: Missile guidance, robot arm manipulation, attaining
macroeconomic target values, etc.


• Aim is to pick $\theta$ to minimize mean-squared error (MSE):
$L(\theta) = E\left[\|\text{actual output} - \text{desired output}\|^2\right]$


• In general nonlinear and/or non-Gaussian systems, not
possible to compute $L(\theta)$

• Get observed squared error $y(\theta) = \|\cdot\|^2$ by running system

• Note that $y(\theta) = \|\cdot\|^2 = L(\theta) + \text{noise}$
- Values of $y(\theta)$, not $L(\theta)$, used in optimization of $\theta$
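
A minimal sketch of the relation $y(\theta) = L(\theta) + \text{noise}$, assuming a deliberately simple scalar system (output = $\theta \cdot u$ plus Gaussian disturbance) so that $L(\theta)$ has a closed form to compare against. In the nonlinear or non-Gaussian settings described above no such formula would be available. The system, its constants, and the function names are all hypothetical.

```python
import random

def observed_squared_error(theta, rng, u=2.0, desired=4.0, noise_sd=0.5):
    """One noisy measurement y(theta) from a single run of the toy system."""
    actual = theta * u + rng.gauss(0.0, noise_sd)
    return (actual - desired) ** 2

def true_loss(theta, u=2.0, desired=4.0, noise_sd=0.5):
    """Closed-form MSE; known here only because the toy system is linear and Gaussian."""
    return (theta * u - desired) ** 2 + noise_sd ** 2

rng = random.Random(0)
theta = 1.5
measurements = [observed_squared_error(theta, rng) for _ in range(20000)]
average_y = sum(measurements) / len(measurements)

print("single measurements:", measurements[:3])
print("average of y(theta):", average_y)
print("L(theta):           ", true_loss(theta))
```

Individual values of $y(\theta)$ scatter widely, and only their average approaches $L(\theta)$, which is why an optimizer that sees one measurement at a time has to cope with the noise.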
Example of Noisy Loss Measurements: Simulation-Based Optimization

• Have credible Monte Carlo simulation of real system

• Parameters $\theta$ in simulation have physical meaning in system

- E.g.: $\theta$ is machine locations in plant layout, timing settings in
traffic control, resource allocation in military operations, etc.

• Run simulation to determine best $\theta$ for use in real system

• Want to minimize average measure of performance $L(\theta)$

- Let $y(\theta)$ represent one simulation output ($y(\theta) = L(\theta) + \text{noise}$)
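
One way to optimize using only noisy simulation outputs is simultaneous perturbation stochastic approximation (SPSA), one of the stochastic approximation methods this deck lists. The sketch below is a basic SPSA loop under stated assumptions: `simulate` is a hypothetical stand-in for a real Monte Carlo simulation (a quadratic loss plus Gaussian noise), and the gain constants `a`, `c`, and `big_a` are illustrative choices, not tuned recommendations. The exponents 0.602 and 0.101 are values commonly suggested for practical use.

```python
import random

TARGET = [1.0, -2.0]  # hypothetical optimum of the toy problem

def simulate(theta, rng, noise_sd=0.1):
    """Stand-in for one simulation run: returns y(theta) = L(theta) + noise."""
    loss = sum((t - s) ** 2 for t, s in zip(theta, TARGET))
    return loss + rng.gauss(0.0, noise_sd)

def spsa(theta0, iterations=2000, a=0.2, c=0.2, big_a=10.0,
         alpha=0.602, gamma=0.101, seed=0):
    """Basic SPSA: two noisy loss measurements per iteration, any dimension."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iterations):
        a_k = a / (k + 1 + big_a) ** alpha   # decaying step-size gain
        c_k = c / (k + 1) ** gamma           # decaying perturbation size
        # Random +/-1 perturbation applied to all components at once.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        y_plus = simulate([t + c_k * d for t, d in zip(theta, delta)], rng)
        y_minus = simulate([t - c_k * d for t, d in zip(theta, delta)], rng)
        # Gradient estimate built from the two noisy measurements only.
        grad_est = [(y_plus - y_minus) / (2.0 * c_k * d) for d in delta]
        theta = [t - a_k * g for t, g in zip(theta, grad_est)]
    return theta

theta_hat = spsa([0.0, 0.0])
print("estimate:", theta_hat, "target:", TARGET)
```

The loop never evaluates $L(\theta)$ or its gradient directly; it uses 4,000 calls to `simulate` in total, and that count of function evaluations is the natural cost measure.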
Some Key Properties in Implementation and Evaluation of Stochastic Algorithms

• Algorithm comparisons via number of evaluations of $L(\theta)$ or
$g(\theta)$ (not iterations)

- Function evaluations typically represent major cost

• Curse of dimensionality

- E.g.: Suppose dim($\theta$) = 10 and each element of $\theta$ can take on 10 values,
giving $10^{10}$ candidate values. With 10,000 random samples,
Prob(finding one of 500 best $\theta$) ≈ 0.0005

- Above example would be much harder with only noisy
function measurements
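
The probability in the curse-of-dimensionality example can be checked directly, assuming the 10,000 samples are drawn uniformly and independently (with replacement) from the $10^{10}$ candidate values. The variable names are illustrative.

```python
import math

num_points = 10 ** 10      # 10 values for each of 10 elements of theta
num_best = 500             # size of the "good enough" set
num_samples = 10_000       # independent uniform random samples

p_single = num_best / num_points
# P(at least one hit) = 1 - (1 - p)^n, computed stably for tiny p.
p_hit = -math.expm1(num_samples * math.log1p(-p_single))

print(p_hit)  # roughly 0.0005
```

Because the per-sample probability is so small, the result is very nearly `num_samples * p_single`, which is where the figure 0.0005 comes from.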

•  Constraints 

•  Limits  of  numerical  comparisons 

-  Avoid  broad  claims  based  on  numerical  studies 

-  Best  to  combine  theory  and  numerical  analysis 
Global vs. Local Solutions


• Global methods tend to have following characteristics:

- Inefficient, especially for high-dimensional $\theta$

-  Relatively  difficult  to  use  (e.g.,  require  very  careful  selection  of 
algorithm  coefficients) 

-  Shaky  theoretical  foundation  for  global  convergence 

•  Much  “hype”  with  many  methods  (genetic  algorithm  [GA] 
software  advertisements): 

-  “...can  handle  the  most  complex  problems,  including 
problems  unsolvable  by  any  other  method.” 

-  “...uses  GAs  to  solve  any  optimization  problem!” 

•  But  there  are  some  mathematically  sound  methods 

-  E.g.,  restricted  settings  for  GAs,  simulated  annealing,  and 
SPSA 
No Free Lunch Theorems


•  Wolpert  and  Macready  (1997)  establish  several  “No  Free 
Lunch”  (NFL)  Theorems  for  optimization 

• NFL Theorems apply to settings where parameter set $\Theta$
and set of loss function values are finite, discrete sets

- Relevant for continuous $\theta$ problem when considering digital
computer implementation

- Results are valid for deterministic and stochastic settings

• Number of optimization problems (mappings from $\Theta$ to
set of loss values) is finite

•  NFL  Theorems  state,  in  essence,  that  no  one  search 
algorithm  is  “best”  for  all  problems 
No Free Lunch Theorems: Basic Formulation

•  Suppose  that 

$N_\theta$ = number of values of $\theta$

$N_L$ = number of values of loss function

• Then

$(N_L)^{N_\theta}$ = number of loss functions

•  There  is  a  finite  (but  possibly  huge)  number  of  loss 
functions 

•  Basic  form  of  NFL  considers  average  performance  over  all 
loss  functions 
Illustration of No Free Lunch Theorems

(Example  1.7  in  ISSO) 

• Three values of $\theta$, two outcomes for noise-free loss L

- Eight possible mappings, hence eight optimization problems

• Mean loss across all problems is same regardless of $\theta$;
entries 1 or 2 in table below represent two possible L
outcomes


Map:          1  2  3  4  5  6  7  8
$\theta_1$    1  1  1  2  2  2  1  2
$\theta_2$    1  1  2  1  1  2  2  2
$\theta_3$    1  2  2  1  2  1  1  2
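
The table can be regenerated by brute force: with three values of $\theta$ and two loss outcomes there are $2^3 = 8$ mappings. The sketch below enumerates them, confirms that the mean loss is identical at every $\theta$, and compares two hypothetical fixed-order search algorithms (the visiting orders are arbitrary choices made for illustration) by the best loss each finds in two evaluations, averaged over all eight problems.

```python
from itertools import product

loss_outcomes = (1, 2)   # the two possible values of L
num_theta = 3            # theta_1, theta_2, theta_3

# Each mapping assigns one loss outcome to every theta value.
mappings = list(product(loss_outcomes, repeat=num_theta))
num_problems = len(mappings)

# Mean loss at each theta, averaged uniformly over all problems.
mean_loss = [sum(m[i] for m in mappings) / num_problems
             for i in range(num_theta)]

def average_best_loss(visit_order):
    """Average over all problems of the lowest loss found by a fixed search order."""
    return sum(min(m[i] for i in visit_order) for m in mappings) / num_problems

algo_a = average_best_loss([0, 1])   # evaluates theta_1, then theta_2
algo_b = average_best_loss([2, 0])   # evaluates theta_3, then theta_1

print(num_problems)    # 8
print(mean_loss)       # [1.5, 1.5, 1.5]
print(algo_a, algo_b)  # 1.25 1.25
```

Either algorithm can beat the other on an individual mapping, but averaged uniformly over all eight they tie.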
No Free Lunch Theorems (cont’d)


•  NFL  Theorems  state,  in  essence: 

Averaging (uniformly) over all possible
problems (loss functions L), all algorithms
perform equally well


•  In  particular,  if  algorithm  1  performs  better  than  algorithm  2 
over  some  set  of  problems,  then  algorithm  2  performs  better 
than  algorithm  1  on  another  set  of  problems 

Overall relative efficiency of two algorithms
cannot be inferred from a few sample problems


•  NFL  theorems  say  nothing  about  specific  algorithms  on 
specific  problems 
Relative Convergence Rates of Deterministic and Stochastic Optimization


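• Theoretical analysis based on convergence rates of
iterates $\hat{\theta}_k$, where k is iteration counter

• Let $\theta^*$ represent optimal value of $\theta$


• For deterministic optimization, a standard rate result is:

$\|\hat{\theta}_k - \theta^*\| = O(c^k), \quad 0 < c < 1$

• Corresponding rate with noisy measurements:

$\|\hat{\theta}_k - \theta^*\| = O\left(\frac{1}{k^{\lambda}}\right), \quad 0 < \lambda \le 1/2$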


•  Stochastic  rate  inherently  slower  in  theory  and  practice 
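
To get a feel for the gap between a geometric rate $O(c^k)$ and a polynomial rate $O(1/k^{\lambda})$, the sketch below tabulates both bounds. It assumes the hidden constants in the $O(\cdot)$ terms equal 1 and picks $c = 0.5$ and the most favorable stochastic exponent $\lambda = 1/2$; these are illustrative values only.

```python
def deterministic_error(k, c=0.5):
    """Geometric bound c^k (constant factor assumed to be 1)."""
    return c ** k

def stochastic_error(k, lam=0.5):
    """Polynomial bound 1/k^lambda (constant factor assumed to be 1)."""
    return k ** (-lam)

for k in (10, 100, 1000):
    print(k, deterministic_error(k), stochastic_error(k))
```

Under these assumptions the deterministic bound falls below $10^{-3}$ after 10 iterations, while the stochastic bound needs about $10^6$ iterations to reach the same level.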
Concluding Remarks

Stochastic search and optimization very widely used

- Handles noise in function evaluations

- Generally better for global optimization

- Broader applicability to “non-nice” problems (robustness)

Some challenges in practical problems

- Noise dramatically affects convergence

- Distinguishing global from local minima not generally easy

- Curse of dimensionality

- Choosing algorithm “tuning coefficients”

Rarely sufficient to use theory for standard deterministic
methods to characterize stochastic methods

“No free lunch” theorems are barrier to exaggerated claims of
power and efficiency of any specific algorithm

Algorithms should be implemented in context: “Better a
rough answer to the right question than an exact answer
to the wrong one” (Lord Kelvin)
Contact Info, and Related Web Sites

•  james.spall@jhuapl.edu 

•  www.jhuapl.edu/SPSA  (Web  site  on  stochastic 
approximation  algorithm) 

•  www.jhuapl.edu/ISSO  (Web  site  on  book 
Introduction  to  Stochastic  Search  and  Optimization) 